Overview

Dataset statistics

Number of variables: 10
Number of observations: 20640
Missing cells: 207
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.7 MiB
Average record size in memory: 138.1 B
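
The missing-cell percentage can be checked from the counts above. The following is a minimal sketch using only figures stated in this report; the variable names are illustrative. It also shows why the same 207 cells appear as 0.1% of the dataset but 1.0% of the total_bedrooms column.

```python
# Figures copied from the "Dataset statistics" table of this report.
n_variables = 10
n_observations = 20640
missing_cells = 207

total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# The report's warnings attribute all 207 missing cells to one column
# (total_bedrooms), so the per-column share is ten times the dataset share.
missing_in_column_pct = 100 * missing_cells / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_cells_pct:.1f}%")
print(f"missing in total_bedrooms: {missing_in_column_pct:.1f}%")
```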

Variable types

NUM: 9
CAT: 1

Warnings

latitude is highly correlated with longitude (High correlation)
longitude is highly correlated with latitude (High correlation)
total_bedrooms is highly correlated with total_rooms and 1 other field (High correlation)
total_rooms is highly correlated with total_bedrooms and 1 other field (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
population is highly correlated with households (High correlation)
total_bedrooms has 207 (1.0%) missing values (Missing)

Reproduction

Analysis started: 2020-11-17 16:11:21.633214
Analysis finished: 2020-11-17 16:11:44.614343
Duration: 22.98 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
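
A report of this kind can be regenerated along the following lines. This is a sketch only: it assumes pandas and pandas-profiling are installed, the function name, file names and title are hypothetical, and the settings in the original config.yaml are not known, so the output may differ from the report shown here.

```python
def build_report(csv_path, html_path):
    """Profile the CSV at csv_path and write an HTML report to html_path."""
    # Imported lazily so this sketch can be loaded without the optional
    # third-party dependencies installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is a placeholder; default settings are used throughout.
    report = ProfileReport(df, title="Housing profile")
    report.to_file(html_path)
    return report
```

A call such as `build_report("housing.csv", "report.html")` would then write the report, where both paths are placeholders.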

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 844
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5697045
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.8
Median: -118.49
Q3: -118.01
95-th percentile: -117.08
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.003531724
Coefficient of variation (CV): -0.01675618195
Kurtosis: -1.330152366
Mean: -119.5697045
Median Absolute Deviation (MAD): 1.28
Skewness: -0.297801208
Sum: -2467918.7
Variance: 4.014139367
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
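
Several of the figures above are derived from others, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation and variance for longitude from the reported values. It assumes the usual definitions (CV as standard deviation divided by mean), which also explains why the CV is negative here: the mean longitude is negative.

```python
# Values copied from the longitude tables of this report.
minimum, maximum = -124.35, -114.31
q1, q3 = -121.8, -118.01
mean = -119.5697045
std = 2.003531724

value_range = maximum - minimum   # reported: 10.04
iqr = q3 - q1                     # reported: 3.79
cv = std / mean                   # reported: -0.01675618195
variance = std ** 2               # reported: 4.014139367

print(round(value_range, 2), round(iqr, 2), cv, variance)
```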
Most frequent values

Value                 Count   Frequency (%)
-118.31               162     0.8%
-118.3                160     0.8%
-118.29               148     0.7%
-118.27               144     0.7%
-118.32               142     0.7%
-118.28               141     0.7%
-118.35               140     0.7%
-118.36               138     0.7%
-118.19               135     0.7%
-118.25               128     0.6%
-118.37               128     0.6%
-118.2                126     0.6%
-118.14               125     0.6%
-118.26               121     0.6%
-118.13               121     0.6%
-118.18               120     0.6%
-118.34               119     0.6%
-118.21               118     0.6%
-118.15               116     0.6%
-118.12               112     0.5%
-118.1                109     0.5%
-118.38               107     0.5%
-118.17               106     0.5%
-118.43               106     0.5%
-118.16               103     0.5%
Other values (819)    17465   84.6%

Smallest values

Value     Count   Frequency (%)
-124.35   1       < 0.1%
-124.3    2       < 0.1%
-124.27   1       < 0.1%
-124.26   1       < 0.1%
-124.25   1       < 0.1%
-124.23   3       < 0.1%
-124.22   1       < 0.1%
-124.21   3       < 0.1%
-124.19   4       < 0.1%
-124.18   6       < 0.1%

Largest values

Value     Count   Frequency (%)
-114.31   1       < 0.1%
-114.47   1       < 0.1%
-114.49   1       < 0.1%
-114.55   1       < 0.1%
-114.56   1       < 0.1%
-114.57   3       < 0.1%
-114.58   2       < 0.1%
-114.59   2       < 0.1%
-114.6    3       < 0.1%
-114.61   3       < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 862
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63186143
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
Median: 34.26
Q3: 37.71
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 2.135952397
Coefficient of variation (CV): 0.05994501302
Kurtosis: -1.117759781
Mean: 35.63186143
Median Absolute Deviation (MAD): 1.23
Skewness: 0.4659530037
Sum: 735441.62
Variance: 4.562292644
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                 Count   Frequency (%)
34.06                 244     1.2%
34.05                 236     1.1%
34.08                 234     1.1%
34.07                 231     1.1%
34.04                 221     1.1%
34.09                 212     1.0%
34.02                 208     1.0%
34.1                  203     1.0%
34.03                 193     0.9%
33.93                 181     0.9%
33.94                 175     0.8%
33.97                 172     0.8%
33.99                 168     0.8%
33.88                 164     0.8%
34.11                 162     0.8%
33.98                 162     0.8%
34.16                 159     0.8%
34.12                 158     0.8%
34.15                 157     0.8%
34.01                 156     0.8%
33.89                 154     0.7%
34.17                 154     0.7%
34.14                 152     0.7%
33.9                  152     0.7%
34                    152     0.7%
Other values (837)    16080   77.9%

Smallest values

Value   Count   Frequency (%)
32.54   1       < 0.1%
32.55   3       < 0.1%
32.56   10      < 0.1%
32.57   18      0.1%
32.58   26      0.1%
32.59   11      0.1%
32.6    9       < 0.1%
32.61   14      0.1%
32.62   13      0.1%
32.63   18      0.1%

Largest values

Value   Count   Frequency (%)
41.95   2       < 0.1%
41.92   1       < 0.1%
41.88   1       < 0.1%
41.86   3       < 0.1%
41.84   1       < 0.1%
41.82   1       < 0.1%
41.81   2       < 0.1%
41.8    3       < 0.1%
41.79   1       < 0.1%
41.78   3       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.63948643
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
Median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58555761
Coefficient of variation (CV): 0.4394477408
Kurtosis: -0.8006288536
Mean: 28.63948643
Median Absolute Deviation (MAD): 10
Skewness: 0.0603306376
Sum: 591119
Variance: 158.3962604
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                Count   Frequency (%)
52                   1273    6.2%
36                   862     4.2%
35                   824     4.0%
16                   771     3.7%
17                   698     3.4%
34                   689     3.3%
26                   619     3.0%
33                   615     3.0%
18                   570     2.8%
25                   566     2.7%
32                   565     2.7%
37                   537     2.6%
15                   512     2.5%
19                   502     2.4%
27                   488     2.4%
24                   478     2.3%
30                   476     2.3%
28                   471     2.3%
20                   465     2.3%
29                   461     2.2%
31                   458     2.2%
23                   448     2.2%
21                   446     2.2%
14                   412     2.0%
22                   399     1.9%
Other values (27)    6035    29.2%

Smallest values

Value   Count   Frequency (%)
1       4       < 0.1%
2       58      0.3%
3       62      0.3%
4       191     0.9%
5       244     1.2%
6       160     0.8%
7       175     0.8%
8       206     1.0%
9       205     1.0%
10      264     1.3%

Largest values

Value   Count   Frequency (%)
52      1273    6.2%
51      48      0.2%
50      136     0.7%
49      134     0.6%
48      177     0.9%
47      198     1.0%
46      245     1.2%
45      294     1.4%
44      356     1.7%
43      353     1.7%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5926
Distinct (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2635.763081
Minimum: 2
Maximum: 39320
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 620.95
Q1: 1447.75
Median: 2127
Q3: 3148
95-th percentile: 6213.2
Maximum: 39320
Range: 39318
Interquartile range (IQR): 1700.25

Descriptive statistics

Standard deviation: 2181.615252
Coefficient of variation (CV): 0.8276977802
Kurtosis: 32.630927
Mean: 2635.763081
Median Absolute Deviation (MAD): 797
Skewness: 4.147343451
Sum: 54402150
Variance: 4759445.106
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                  Count   Frequency (%)
1527                   18      0.1%
1582                   17      0.1%
1613                   17      0.1%
2127                   16      0.1%
2053                   15      0.1%
1607                   15      0.1%
1471                   15      0.1%
1717                   15      0.1%
1703                   15      0.1%
1722                   15      0.1%
1787                   14      0.1%
1705                   14      0.1%
1880                   14      0.1%
1724                   14      0.1%
1745                   14      0.1%
1562                   14      0.1%
1650                   14      0.1%
1743                   14      0.1%
1731                   14      0.1%
1759                   13      0.1%
2228                   13      0.1%
1808                   13      0.1%
1170                   13      0.1%
1283                   13      0.1%
1649                   13      0.1%
Other values (5901)    20278   98.2%

Smallest values

Value   Count   Frequency (%)
2       1       < 0.1%
6       1       < 0.1%
8       1       < 0.1%
11      1       < 0.1%
12      1       < 0.1%
15      2       < 0.1%
16      1       < 0.1%
18      4       < 0.1%
19      2       < 0.1%
20      2       < 0.1%

Largest values

Value   Count   Frequency (%)
39320   1       < 0.1%
37937   1       < 0.1%
32627   1       < 0.1%
32054   1       < 0.1%
30450   1       < 0.1%
30405   1       < 0.1%
30401   1       < 0.1%
28258   1       < 0.1%
27870   1       < 0.1%
27700   1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1923
Distinct (%): 9.4%
Missing: 207
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8705525
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 137
Q1: 296
Median: 435
Q3: 647
95-th percentile: 1275.4
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 421.3850701
Coefficient of variation (CV): 0.7834321252
Kurtosis: 21.98557506
Mean: 537.8705525
Median Absolute Deviation (MAD): 162
Skewness: 3.459546332
Sum: 10990309
Variance: 177565.3773
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                  Count   Frequency (%)
280                    55      0.3%
331                    51      0.2%
345                    50      0.2%
393                    49      0.2%
343                    49      0.2%
394                    48      0.2%
328                    48      0.2%
348                    48      0.2%
272                    47      0.2%
309                    47      0.2%
295                    46      0.2%
314                    46      0.2%
322                    46      0.2%
399                    46      0.2%
317                    46      0.2%
284                    45      0.2%
388                    45      0.2%
290                    45      0.2%
291                    45      0.2%
346                    45      0.2%
287                    45      0.2%
340                    45      0.2%
313                    45      0.2%
269                    44      0.2%
460                    44      0.2%
Other values (1898)    19263   93.3%
(Missing)              207     1.0%

Smallest values

Value   Count   Frequency (%)
1       1       < 0.1%
2       2       < 0.1%
3       5       < 0.1%
4       7       < 0.1%
5       6       < 0.1%
6       5       < 0.1%
7       6       < 0.1%
8       8       < 0.1%
9       7       < 0.1%
10      8       < 0.1%

Largest values

Value   Count   Frequency (%)
6445    1       < 0.1%
6210    1       < 0.1%
5471    1       < 0.1%
5419    1       < 0.1%
5290    1       < 0.1%
5033    1       < 0.1%
5027    1       < 0.1%
4957    1       < 0.1%
4952    1       < 0.1%
4819    1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3888
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.476744
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 348
Q1: 787
Median: 1166
Q3: 1725
95-th percentile: 3288
Maximum: 35682
Range: 35679
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.462122
Coefficient of variation (CV): 0.7944444737
Kurtosis: 73.55311639
Mean: 1425.476744
Median Absolute Deviation (MAD): 440
Skewness: 4.935858227
Sum: 29421840
Variance: 1282470.457
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                  Count   Frequency (%)
891                    25      0.1%
1227                   24      0.1%
1052                   24      0.1%
761                    24      0.1%
850                    24      0.1%
825                    23      0.1%
999                    22      0.1%
1005                   22      0.1%
782                    22      0.1%
781                    21      0.1%
872                    21      0.1%
1098                   21      0.1%
753                    21      0.1%
926                    20      0.1%
1047                   20      0.1%
899                    20      0.1%
1203                   20      0.1%
735                    20      0.1%
1158                   20      0.1%
1155                   20      0.1%
986                    20      0.1%
861                    20      0.1%
1011                   20      0.1%
837                    20      0.1%
804                    20      0.1%
Other values (3863)    20106   97.4%

Smallest values

Value   Count   Frequency (%)
3       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
8       4       < 0.1%
9       2       < 0.1%
11      1       < 0.1%
13      4       < 0.1%
14      3       < 0.1%
15      2       < 0.1%
17      2       < 0.1%

Largest values

Value   Count   Frequency (%)
35682   1       < 0.1%
28566   1       < 0.1%
16305   1       < 0.1%
16122   1       < 0.1%
15507   1       < 0.1%
15037   1       < 0.1%
13251   1       < 0.1%
12873   1       < 0.1%
12427   1       < 0.1%
12203   1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1815
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 125
Q1: 280
Median: 409
Q3: 605
95-th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                  Count   Frequency (%)
306                    57      0.3%
335                    56      0.3%
386                    56      0.3%
282                    55      0.3%
429                    54      0.3%
375                    53      0.3%
297                    51      0.2%
284                    51      0.2%
340                    50      0.2%
362                    50      0.2%
278                    50      0.2%
380                    50      0.2%
316                    49      0.2%
329                    49      0.2%
319                    49      0.2%
330                    49      0.2%
377                    48      0.2%
426                    48      0.2%
309                    48      0.2%
341                    48      0.2%
357                    47      0.2%
424                    46      0.2%
295                    46      0.2%
269                    46      0.2%
352                    46      0.2%
Other values (1790)    19388   93.9%

Smallest values

Value   Count   Frequency (%)
1       1       < 0.1%
2       3       < 0.1%
3       4       < 0.1%
4       4       < 0.1%
5       7       < 0.1%
6       5       < 0.1%
7       10      < 0.1%
8       8       < 0.1%
9       9       < 0.1%
10      7       < 0.1%

Largest values

Value   Count   Frequency (%)
6082    1       < 0.1%
5358    1       < 0.1%
5189    1       < 0.1%
5050    1       < 0.1%
4930    1       < 0.1%
4855    1       < 0.1%
4769    1       < 0.1%
4616    1       < 0.1%
4490    1       < 0.1%
4372    1       < 0.1%

median_income
Real number (ℝ≥0)

Distinct: 12928
Distinct (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.870671003
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.60057
Q1: 2.5634
Median: 3.5348
Q3: 4.74325
95-th percentile: 7.300305
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.17985

Descriptive statistics

Standard deviation: 1.899821718
Coefficient of variation (CV): 0.4908249026
Kurtosis: 4.952524102
Mean: 3.870671003
Median Absolute Deviation (MAD): 1.0642
Skewness: 1.646656702
Sum: 79890.6495
Variance: 3.60932256
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                   Count   Frequency (%)
3.125                   49      0.2%
15.0001                 49      0.2%
2.875                   46      0.2%
4.125                   44      0.2%
2.625                   44      0.2%
3.875                   41      0.2%
3                       38      0.2%
3.375                   38      0.2%
3.625                   37      0.2%
4                       37      0.2%
4.375                   35      0.2%
2.125                   33      0.2%
2.375                   32      0.2%
4.625                   31      0.2%
3.5                     30      0.1%
3.25                    29      0.1%
3.75                    29      0.1%
4.875                   29      0.1%
1.625                   29      0.1%
2.25                    29      0.1%
4.25                    28      0.1%
2.5                     28      0.1%
3.6875                  26      0.1%
2.75                    25      0.1%
4.5                     24      0.1%
Other values (12903)    19780   95.8%

Smallest values

Value    Count   Frequency (%)
0.4999   12      0.1%
0.536    10      < 0.1%
0.5495   1       < 0.1%
0.6433   1       < 0.1%
0.6775   1       < 0.1%
0.6825   1       < 0.1%
0.6831   1       < 0.1%
0.696    1       < 0.1%
0.6991   1       < 0.1%
0.7007   1       < 0.1%

Largest values

Value     Count   Frequency (%)
15.0001   49      0.2%
15        2       < 0.1%
14.9009   1       < 0.1%
14.5833   1       < 0.1%
14.4219   1       < 0.1%
14.4113   1       < 0.1%
14.2959   1       < 0.1%
14.2867   1       < 0.1%
13.947    1       < 0.1%
13.8556   1       < 0.1%

median_house_value
Real number (ℝ≥0)

Distinct: 3842
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.8169
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66200
Q1: 119600
Median: 179700
Q3: 264725
95-th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816e+10
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values

Value                  Count   Frequency (%)
500001                 965     4.7%
137500                 122     0.6%
162500                 117     0.6%
112500                 103     0.5%
187500                 93      0.5%
225000                 92      0.4%
350000                 79      0.4%
87500                  78      0.4%
275000                 65      0.3%
150000                 64      0.3%
175000                 63      0.3%
100000                 62      0.3%
125000                 56      0.3%
67500                  55      0.3%
250000                 47      0.2%
200000                 46      0.2%
118800                 39      0.2%
450000                 37      0.2%
156300                 35      0.2%
212500                 33      0.2%
181300                 31      0.2%
193800                 31      0.2%
300000                 30      0.1%
75000                  30      0.1%
55000                  29      0.1%
Other values (3817)    18238   88.4%

Smallest values

Value   Count   Frequency (%)
14999   4       < 0.1%
17500   1       < 0.1%
22500   4       < 0.1%
25000   1       < 0.1%
26600   1       < 0.1%
26900   1       < 0.1%
27500   1       < 0.1%
28300   1       < 0.1%
30000   2       < 0.1%
32500   4       < 0.1%

Largest values

Value    Count   Frequency (%)
500001   965     4.7%
500000   27      0.1%
499100   1       < 0.1%
499000   1       < 0.1%
498800   1       < 0.1%
498700   1       < 0.1%
498600   1       < 0.1%
498400   1       < 0.1%
497600   1       < 0.1%
497400   1       < 0.1%

ocean_proximity
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 161.4 KiB

Value         Count   Frequency (%)
<1H OCEAN\    9136    44.3%
INLAND\       6550    31.7%
NEAR OCEAN\   2658    12.9%
NEAR BAY\     2290    11.1%
ISLAND\       5       < 0.1%
INLAND}       1       < 0.1%

[Figure: frequencies of value counts]
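
Every label in the table above ends in a backslash except one, `INLAND}`, which is why the report counts 6 distinct categories. If those trailing characters are import debris (an assumption; the report itself does not say so), the labels could be normalized as in the sketch below. The helper name `clean_label` is hypothetical.

```python
from collections import Counter

# Category counts copied from the ocean_proximity table of this report.
raw_counts = {
    "<1H OCEAN\\": 9136,
    "INLAND\\": 6550,
    "NEAR OCEAN\\": 2658,
    "NEAR BAY\\": 2290,
    "ISLAND\\": 5,
    "INLAND}": 1,
}

def clean_label(label):
    """Drop trailing backslashes or closing braces (assumed to be debris)."""
    return label.rstrip("\\}").strip()

clean_counts = Counter()
for label, count in raw_counts.items():
    clean_counts[clean_label(label)] += count

print(clean_counts.most_common())
```

Under that assumption the single `INLAND}` row merges into `INLAND`, leaving 5 categories.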

Unique

Unique: 1
Unique (%): < 0.1%
[Figure: histogram of lengths of the category]

Length

Max length: 11
Median length: 10
Mean length: 9.064922481
Min length: 7
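
The length statistics follow directly from the category counts, provided the trailing `\` or `}` of each label is counted as a character. A minimal sketch, using only counts stated in this report:

```python
from statistics import median

# Category counts copied from the ocean_proximity table of this report.
raw_counts = {
    "<1H OCEAN\\": 9136,
    "INLAND\\": 6550,
    "NEAR OCEAN\\": 2658,
    "NEAR BAY\\": 2290,
    "ISLAND\\": 5,
    "INLAND}": 1,
}

# Expand to one length per row; 20640 small integers is cheap.
lengths = []
for label, count in raw_counts.items():
    lengths.extend([len(label)] * count)

total_chars = sum(lengths)
mean_length = total_chars / len(lengths)

print(min(lengths), median(lengths), max(lengths), round(mean_length, 9))
```

The total of 187100 characters is the same figure that appears as the ASCII block count in the Unicode tables.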

Overview of Unicode Properties

Unique unicode characters: 18
Unique unicode categories: 6
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
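
As an illustration of such properties, the sketch below tallies the Unicode general category of every character in ocean_proximity using Python's standard `unicodedata` module. This is not necessarily how the report computed its tables. The two-letter codes correspond to the category names used in this report: `Lu` is Uppercase Letter, `Po` Other Punctuation, `Zs` Space Separator, `Sm` Math Symbol, `Nd` Decimal Number and `Pe` Close Punctuation.

```python
import unicodedata
from collections import Counter

# Category counts copied from the ocean_proximity table of this report.
raw_counts = {
    "<1H OCEAN\\": 9136,
    "INLAND\\": 6550,
    "NEAR OCEAN\\": 2658,
    "NEAR BAY\\": 2290,
    "ISLAND\\": 5,
    "INLAND}": 1,
}

category_counts = Counter()
for label, count in raw_counts.items():
    for ch in label:
        # unicodedata.category returns a two-letter code such as "Lu" or "Zs".
        category_counts[unicodedata.category(ch)] += count

print(category_counts.most_common())
```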

Most occurring characters

Value     Count   Frequency (%)
N         29849   16.0%
A         25588   13.7%
\         20639   11.0%
E         16742   8.9%
(space)   14084   7.5%
O         11794   6.3%
C         11794   6.3%
<         9136    4.9%
1         9136    4.9%
H         9136    4.9%
I         6556    3.5%
L         6556    3.5%
D         6556    3.5%
R         4948    2.6%
B         2290    1.2%
Y         2290    1.2%
S         5       < 0.1%
}         1       < 0.1%

Most occurring categories

Value               Count    Frequency (%)
Uppercase Letter    134104   71.7%
Other Punctuation   20639    11.0%
Space Separator     14084    7.5%
Math Symbol         9136     4.9%
Decimal Number      9136     4.9%
Close Punctuation   1        < 0.1%

Most frequent Uppercase Letter characters

Value   Count   Frequency (%)
N       29849   22.3%
A       25588   19.1%
E       16742   12.5%
O       11794   8.8%
C       11794   8.8%
H       9136    6.8%
I       6556    4.9%
L       6556    4.9%
D       6556    4.9%
R       4948    3.7%
B       2290    1.7%
Y       2290    1.7%
S       5       < 0.1%

Most frequent Space Separator characters

Value     Count   Frequency (%)
(space)   14084   100.0%

Most frequent Other Punctuation characters

Value   Count   Frequency (%)
\       20639   100.0%

Most frequent Math Symbol characters

Value   Count   Frequency (%)
<       9136    100.0%

Most frequent Decimal Number characters

Value   Count   Frequency (%)
1       9136    100.0%

Most frequent Close Punctuation characters

Value   Count   Frequency (%)
}       1       100.0%

Most occurring scripts

Value    Count    Frequency (%)
Latin    134104   71.7%
Common   52996    28.3%

Most frequent Latin characters

Value   Count   Frequency (%)
N       29849   22.3%
A       25588   19.1%
E       16742   12.5%
O       11794   8.8%
C       11794   8.8%
H       9136    6.8%
I       6556    4.9%
L       6556    4.9%
D       6556    4.9%
R       4948    3.7%
B       2290    1.7%
Y       2290    1.7%
S       5       < 0.1%

Most frequent Common characters

Value     Count   Frequency (%)
\         20639   38.9%
(space)   14084   26.6%
<         9136    17.2%
1         9136    17.2%
}         1       < 0.1%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   187100   100.0%

Most frequent ASCII characters

Value     Count   Frequency (%)
N         29849   16.0%
A         25588   13.7%
\         20639   11.0%
E         16742   8.9%
(space)   14084   7.5%
O         11794   6.3%
C         11794   6.3%
<         9136    4.9%
1         9136    4.9%
H         9136    4.9%
I         6556    3.5%
L         6556    3.5%
D         6556    3.5%
R         4948    2.6%
B         2290    1.2%
Y         2290    1.2%
S         5       < 0.1%
}         1       < 0.1%

Interactions

[Figures: 81 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
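
The calculation described above can be sketched in plain Python as follows. The function name and sample data are illustrative and are not taken from the dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)

# Illustrative data, not taken from the dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [10 * x + 3 for x in xs]      # steep increasing line
shallow = [0.1 * x - 2 for x in xs]   # shallow increasing line
falling = [-0.5 * x + 7 for x in xs]  # decreasing line

print(pearson_r(xs, steep), pearson_r(xs, shallow), pearson_r(xs, falling))
```

Both increasing lines give the same r despite their different slopes, which is the location and scale invariance mentioned above.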

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
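
A minimal sketch of that procedure: rank both variables, then apply the Pearson formula to the ranks. It assumes tied values receive the average of their positions, a common convention that the report does not specify. All names and data are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Illustrative data: a monotonic but nonlinear relationship.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this cubic relationship ρ reaches 1 while Pearson's r stays below 1, which illustrates the difference described above.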

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
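
The pair-counting definition can be sketched directly. This follows the formula exactly as stated above (often called tau-a). Library implementations commonly apply a correction for ties, so their results may differ when the data contain tied values. Names and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 is a tie and counts as neither
    return (concordant - discordant) / len(pairs)

# Illustrative data, not taken from the dataset.
xs = [1, 2, 3, 4]
print(kendall_tau(xs, [10, 20, 30, 40]))  # same order
print(kendall_tau(xs, [40, 30, 20, 10]))  # reversed order
print(kendall_tau(xs, [10, 30, 20, 40]))  # one swapped pair out of six
```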

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

[Figures: 3 missing-value plots]

Sample

First rows

     longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0    -122.23    37.88     41                  880          129.0           322         126         8.3252         452600              NEAR BAY\
1    -122.22    37.86     21                  7099         1106.0          2401        1138        8.3014         358500              NEAR BAY\
2    -122.24    37.85     52                  1467         190.0           496         177         7.2574         352100              NEAR BAY\
3    -122.25    37.85     52                  1274         235.0           558         219         5.6431         341300              NEAR BAY\
4    -122.25    37.85     52                  1627         280.0           565         259         3.8462         342200              NEAR BAY\
5    -122.25    37.85     52                  919          213.0           413         193         4.0368         269700              NEAR BAY\
6    -122.25    37.84     52                  2535         489.0           1094        514         3.6591         299200              NEAR BAY\
7    -122.25    37.84     52                  3104         687.0           1157        647         3.1200         241400              NEAR BAY\
8    -122.26    37.84     42                  2555         665.0           1206        595         2.0804         226700              NEAR BAY\
9    -122.25    37.84     52                  3549         707.0           1551        714         3.6912         261100              NEAR BAY\

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
20630  -121.32    39.29     11                  2640         505.0           1257        445         3.5673         112000              INLAND\
20631  -121.40    39.33     15                  2655         493.0           1200        432         3.5179         107200              INLAND\
20632  -121.45    39.26     15                  2319         416.0           1047        385         3.1250         115600              INLAND\
20633  -121.53    39.19     27                  2080         412.0           1082        382         2.5495         98300               INLAND\
20634  -121.56    39.27     28                  2332         395.0           1041        344         3.7125         116800              INLAND\
20635  -121.09    39.48     25                  1665         374.0           845         330         1.5603         78100               INLAND\
20636  -121.21    39.49     18                  697          150.0           356         114         2.5568         77100               INLAND\
20637  -121.22    39.43     17                  2254         485.0           1007        433         1.7000         92300               INLAND\
20638  -121.32    39.43     18                  1860         409.0           741         349         1.8672         84700               INLAND\
20639  -121.24    39.37     16                  2785         616.0           1387        530         2.3886         89400               INLAND}